Submit your lab/homework in Moodle as two files: (1) an R Markdown file (.Rmd) and (2) an HTML file. Other formats will not be accepted. Support your responses with both textual explanations and the code that produces your results.

Part 1

Your first task is to recreate the animated gapminder plot for the full set of years.

First, use the following code to read the three data files into R:

gdp_per_cap <- 
  read.csv(
    "income_per_person_gdppercapita_ppp_inflation_adjusted.csv", 
    header = TRUE, 
    stringsAsFactors = FALSE,
    check.names = FALSE
    )
life_exp <- 
  read.csv(
    "life_expectancy_years.csv",
    header = TRUE,
    stringsAsFactors = FALSE,
    check.names = FALSE
    )
pop <-
  read.csv(
    "population_total.csv",
    header = TRUE,
    stringsAsFactors = FALSE,
    check.names = FALSE
  )
  1. [70 pt] Create a single tidy dataframe that includes all the data you need to recreate the plot for all the years up to and including 2020. The structure of your data should be similar to that of the gapminder dataset that is provided in the gapminder package:
data(gapminder, package = "gapminder")
head(gapminder)
##       country continent year lifeExp      pop gdpPercap
## 1 Afghanistan      Asia 1952  28.801  8425333  779.4453
## 2 Afghanistan      Asia 1957  30.332  9240934  820.8530
## 3 Afghanistan      Asia 1962  31.997 10267083  853.1007
## 4 Afghanistan      Asia 1967  34.020 11537966  836.1971
## 5 Afghanistan      Asia 1972  36.088 13079460  739.9811
## 6 Afghanistan      Asia 1977  38.438 14880372  786.1134
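As an illustration of the reshaping this task calls for, here is a minimal sketch on a made-up table. It assumes the CSV files are laid out wide, with one row per country and one column per year; `toy_wide`, its values, and the output column names are hypothetical and not part of the assignment data.

```r
library(tidyr)

# Hypothetical toy table in a wide layout:
# one row per country, one column per year (values are made up).
toy_wide <- data.frame(
  country = c("A", "B"),
  `1800` = c(1.0, 2.0),
  `1801` = c(1.5, 2.5),
  check.names = FALSE  # keep the year column names as "1800", "1801"
)

# Stack every column except `country` into (year, value) pairs.
toy_long <- pivot_longer(
  toy_wide,
  cols = -country,
  names_to = "year",
  values_to = "value"
)

# Column names arrive as character strings, so convert the year explicitly.
toy_long$year <- as.integer(toy_long$year)
```

The result has one row per country-year combination, which is the same shape as the gapminder dataset shown above.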

Use the code from the introductory presentation to generate the animated plot for the entire period.

Hints: i. Make sure each variable has the correct data type (character, numeric, etc.). ii. Consider using countrycode::countrycode (that is, the countrycode function from the countrycode package).
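The two hints can be tried out on a few values before applying them to the full data. This is a small sketch only; the vectors `countries` and `year_chr` are made up for illustration.

```r
library(countrycode)

# Hypothetical sample of country names (not read from the assignment files).
countries <- c("Cuba", "Iraq", "United States")

# Map English country names to continents.
continent <- countrycode(
  sourcevar = countries,
  origin = "country.name",
  destination = "continent"
)

# Years taken from column names are character strings; convert them
# to numbers before filtering or animating by year.
year_chr <- c("1919", "1920")
year_num <- as.numeric(year_chr)
```

Names that countrycode cannot match come back as NA with a warning, so it is worth checking for missing continents after the conversion.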

A solution:

library(tidyverse)
## ── Attaching packages ────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.0.5     ✓ dplyr   1.0.3
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0
## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
gdp_per_cap_long <- 
  gdp_per_cap %>% 
    pivot_longer(setdiff(colnames(gdp_per_cap), "country"), names_to = "Year", values_to = "GDP_per_cap")

life_exp_long <- 
  life_exp %>%
    pivot_longer(setdiff(colnames(life_exp), "country"), names_to = "Year", values_to = "Life_Expectancy")

pop_long <-
  pop %>%
    pivot_longer(setdiff(colnames(pop), "country"), names_to = "Year", values_to = "Population")
  
library(countrycode)
gdp_life_pop <- 
  left_join(gdp_per_cap_long, life_exp_long, by = c("country", "Year")) %>%
  left_join(pop_long, by = c("country", "Year")) %>%
  drop_na() %>%
  mutate(Continent = countrycode(sourcevar = country, origin = "country.name", destination = "continent")) %>%
  mutate(Year = as.numeric(Year)) %>%
  filter(Year <= 2020)
  
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
gg <- ggplot(gdp_life_pop, aes(GDP_per_cap, Life_Expectancy, color = Continent)) +
  geom_point(aes(size = Population, frame = Year, ids = country)) +
  scale_x_log10()
## Warning: Ignoring unknown aesthetics: frame, ids
ggplotly(gg) %>% animation_opts(frame = 100)
  1. [30 pt] Create a similar graph, but one that depicts on the y-axis the life expectancy of each country relative to the life expectancy in the US (for every given year). That is, the y-axis will present data like (life expectancy for Cuba in 1919) / (life expectancy for the US in 1919), (life expectancy for Cuba in 1920) / (life expectancy for the US in 1920), (life expectancy for Iraq in 1919) / (life expectancy for the US in 1919), (life expectancy for Iraq in 1920) / (life expectancy for the US in 1920), etc.

Hint: if you do this correctly, the point that represents the US should always have a y-value of 1.
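One way to sanity-check this hint is to run the ratio on a tiny made-up table first. The sketch below assumes a long data frame with `country`, `Year`, and `Life_Expectancy` columns; the numbers in `toy` are invented and are not real life expectancies.

```r
library(dplyr)

# Hypothetical toy data: two countries, two years, made-up values.
toy <- data.frame(
  country = c("Cuba", "United States", "Cuba", "United States"),
  Year = c(1919, 1919, 1920, 1920),
  Life_Expectancy = c(30, 60, 33, 55)
)

toy_rel <- toy %>%
  group_by(Year) %>%
  # Within each year, divide every value by that year's US value.
  mutate(relUS = Life_Expectancy / Life_Expectancy[country == "United States"]) %>%
  ungroup()

# Every US row should have a ratio of exactly 1.
us_ok <- all(toy_rel$relUS[toy_rel$country == "United States"] == 1)
```

This relies on each year containing exactly one US row; a year without a US row would make the division fail, so it is worth confirming that on the real data.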

A solution:

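# Life expectancy relative to the US, computed separately within each year:
gdp_life_pop <- 
  gdp_life_pop %>%
  group_by(Year) %>%
  mutate(relUS = Life_Expectancy / Life_Expectancy[country == "United States"]) %>%
  ungroup()  # drop the grouping so later steps are not affected by it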
  
gg <- ggplot(gdp_life_pop, aes(GDP_per_cap, relUS, color = Continent)) +
  geom_point(aes(size = Population, frame = Year, ids = country)) +
  geom_point(data = gdp_life_pop %>% filter(country == "United States"), aes(size = Population, frame = Year, ids = country), shape = 1, col = "black") +
  scale_x_log10()
## Warning: Ignoring unknown aesthetics: frame, ids

## Warning: Ignoring unknown aesthetics: frame, ids
ggplotly(gg) %>% animation_opts(frame = 100)
  1. [20 pt Extra Credit] Add or manipulate any new variables and create another plot (based on a similar template) that helps you tell a story you think is interesting and perhaps obscured by the way the data and plot are currently organized. If you add data from other sources, you may cover only part of the period, but make sure you cover at least 40 years.